[Model] Decouple glm4v #22751
Conversation
Signed-off-by: Jee Jee Li <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Code Review
This pull request effectively decouples Glm4vForConditionalGeneration and Glm4vMoeForConditionalGeneration to address differences in their configurations and improve maintainability. The changes to the model registry, documentation, and packed_modules_mapping are appropriate. I have one suggestion to make the code more robust by explicitly disabling LoRA support for Glm4vMoeForConditionalGeneration to prevent potential runtime errors, aligning the code's behavior with the stated intention in the documentation and pull request description.
```python
packed_modules_mapping = {
    "qkv_proj": [
        "q_proj",
        "k_proj",
        "v_proj",
    ],
    "gate_up_proj": [
        "gate_proj",
        "up_proj",
    ],
}
```
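For context on why this mapping matters for LoRA: adapters are trained against the unfused Hugging Face layer names (`q_proj`, `k_proj`, ...), while the serving engine fuses them into packed projections (`qkv_proj`, ...). A minimal sketch of that translation, using a hypothetical helper (not vLLM's actual code), assuming the mapping shown above:

```python
# Hypothetical sketch: translate unfused LoRA target-module names
# (as saved by HF/PEFT) into the fused projection names an engine
# uses internally. Mirrors the mapping from the diff above.
PACKED_MODULES_MAPPING = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    "gate_up_proj": ["gate_proj", "up_proj"],
}


def resolve_target_modules(target_modules):
    """Map each unfused module name to its fused counterpart, if any."""
    # Build the inverse index: sub-module name -> packed name.
    inverse = {
        sub: packed
        for packed, subs in PACKED_MODULES_MAPPING.items()
        for sub in subs
    }
    resolved = []
    for name in target_modules:
        fused = inverse.get(name, name)  # unmapped names pass through
        if fused not in resolved:        # deduplicate fused entries
            resolved.append(fused)
    return resolved


print(resolve_target_modules(["q_proj", "k_proj", "v_proj", "o_proj"]))
# → ['qkv_proj', 'o_proj']
```

If the two model classes silently share a mapping that does not match one of them, this translation step is exactly where a `target_module` mismatch surfaces, which is the failure mode the linked bug #22077 describes.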
The pull request description and documentation state that LoRA is not supported for Glm4vMoeForConditionalGeneration due to a bug. However, the class currently inherits SupportsLoRA from its parent, which will cause vLLM to attempt to apply LoRA adapters if provided, leading to a runtime error. To prevent this, I suggest adding an __init__ method that explicitly checks for and disallows LoRA configuration for this model. This will provide a clear error message to users and make the code's behavior consistent with the documentation.
Suggested change:

```python
packed_modules_mapping = {
    "qkv_proj": [
        "q_proj",
        "k_proj",
        "v_proj",
    ],
    "gate_up_proj": [
        "gate_proj",
        "up_proj",
    ],
}

def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
    # LoRA is not supported for this model yet.
    if vllm_config.lora_config:
        raise NotImplementedError(
            "LoRA is not currently supported for "
            "Glm4vMoeForConditionalGeneration.")
    super().__init__(vllm_config=vllm_config, prefix=prefix)
```
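The effect of such a guard can be reproduced in isolation with stand-in config objects (these dataclasses are simplified stand-ins for vLLM's real `VllmConfig`/`LoRAConfig`, and the class body is reduced to just the check):

```python
# Minimal reproduction of the fail-fast LoRA guard, with stand-in
# config types instead of vLLM's real configuration classes.
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoRAConfig:          # stand-in for vllm.config.LoRAConfig
    max_lora_rank: int = 16


@dataclass
class VllmConfig:          # stand-in for vllm.config.VllmConfig
    lora_config: Optional[LoRAConfig] = None


class Glm4vMoeForConditionalGeneration:
    def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
        # Fail fast with a clear message instead of a confusing
        # target_module mismatch later, during adapter loading.
        if vllm_config.lora_config:
            raise NotImplementedError(
                "LoRA is not currently supported for "
                "Glm4vMoeForConditionalGeneration.")


# Without a LoRA config, construction succeeds.
Glm4vMoeForConditionalGeneration(vllm_config=VllmConfig())

# With a LoRA config, the guard raises immediately.
try:
    Glm4vMoeForConditionalGeneration(
        vllm_config=VllmConfig(lora_config=LoRAConfig()))
except NotImplementedError as exc:
    print(exc)
```

The design choice is the usual one for unsupported-feature guards: raising at model construction surfaces the limitation to the user at startup, rather than deep inside adapter-loading machinery.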
Signed-off-by: Jee Jee Li <[email protected]> Signed-off-by: Diego-Castan <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]> Signed-off-by: Xiao Yu <[email protected]>
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.

Purpose
Decouple `Glm4vForConditionalGeneration` and `Glm4vMoeForConditionalGeneration`. On one hand, their `packed_modules_mapping` entries have some differences (the root cause of [Bug]: GLM-4.1V lora trained model reports target_module mismatch error #22077), and on the other hand, decoupling also improves the maintainability of the model code. `Glm4vMoeForConditionalGeneration` does not support LoRA and will raise the following error. I have removed the LoRA label for now and will investigate the issue ASAP.

cc @zRzRzRzRzRzRzR
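Schematically, the decoupling means each model class carries its own `packed_modules_mapping` rather than inheriting a shared one. The sketch below is illustrative only; the MoE mapping shown is an assumption for demonstration, not the exact mapping in vLLM:

```python
# Illustrative sketch of the decoupling: each class owns its mapping,
# so a difference between the two can no longer cause a silent
# target_module mismatch. Contents of the MoE mapping are assumed.
class Glm4vForConditionalGeneration:
    # Dense GLM-4.1V: fused qkv and gate/up projections.
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }


class Glm4vMoeForConditionalGeneration:
    # MoE variant defines its own mapping instead of inheriting the
    # dense one (hypothetical contents, for illustration only).
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    }


# The two mappings are now independent and may legitimately differ.
print(Glm4vForConditionalGeneration.packed_modules_mapping
      != Glm4vMoeForConditionalGeneration.packed_modules_mapping)
# → True
```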
Test Plan
Test Result
(Optional) Documentation Update